1 Introduction

In this tutorial we will go through some of the possibilities offered by ‘ggtree’ to visualize phylogenetic trees, format them and add additional information to the plot.

ggtree’ is an extension of ggplot2 and, as such, it offers many possibilities to format, manipulate and map data to phylogenetic trees.

Additional information available in: https://bioconductor.org/packages/release/bioc/html/ggtree.html

Additional tutorials and examples:

https://yulab-smu.top/treedata-book/index.html

https://xiayh17.gitee.io/treedata-book/index.html

https://github.com/GuangchuangYu/ggtree-current-protocols

https://github.com/fionarhuang/TreeHeatmap

References:

  • G Yu. Using ggtree to visualize data on tree-like structures. Current Protocols in Bioinformatics, 2020, 69:e96. doi: 10.1002/cpbi.96.

  • LG Wang, TTY Lam, S Xu, Z Dai, L Zhou, T Feng, P Guo, CW Dunn, BR Jones, T Bradley, H Zhu, Y Guan, Y Jiang, G Yu*. treeio: an R package for phylogenetic tree input and output with richly annotated and associated data. Molecular Biology and Evolution. 2020, 37(2):599-603. doi: 10.1093/molbev/msz240.

  • G Yu, TTY Lam, H Zhu, Y Guan. Two methods for mapping and visualizing associated data on phylogeny using ggtree. Molecular Biology and Evolution. 2018, 35(2):3041-3043. doi: 10.1093/molbev/msy194.

  • G Yu, DK Smith, H Zhu, Y Guan, TTY Lam*. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution. 2017, 8(1):28-36. doi: 10.1111/2041-210X.12628.

1.1 Installing required packages

In order to install ‘ggtree’, you need BiocManager. Additionally, we will also use ‘ape’ (Analyses of Phylogenetics and Evolution).

#if (!require("BiocManager", quietly = TRUE))
#    install.packages("BiocManager")

#if (!require("ggtree", quietly = TRUE))
#  BiocManager::install("ggtree")

1.2 Loading packages

library(ggtree)
library(ggtreeExtra)
library(ape)
library(ggplot2)
library(ggtext)
library(ggforce)
library(reshape2)
library(ggimage)
library(grid)
library(tidytree)
library(phylobase)
library(ggstance)
library(ggjoy)
library(dplyr)

1.3 Set your working directory

In my working directory, I have a folder for phylogenetic trees (“trees”), data (“data”), output figures (“figures”) and some microscopic photos (“photos”).

setwd("F:/Home_office_2020-03-16/Trichosporonales_project/Courses/ggtree_workshop/")

2 Load session

load("workshop_ggtree_session.RData")

3 Loading data

We start by importing a phylogenetic tree, previously calculated using RAxML. I will call it “tree”. Additionally, we will also import some data we have available for our dataset (“species_info”). This information can be numeric (e.g. size, temperature, etc) or categorical (e.g. genus name, host plant, accession number, etc). We will combine and visualize this information later, together with the phylogeny.

Important: any data imported (e.g. species_info) must be converted into a data frame. Otherwise, the mapping will not work.

tree = read.tree("trees/RAxML.tree")

species_info = read.delim("data/species_info.txt")
species_info = as.data.frame(species_info)

The phylogeny and data will be combined based on taxa IDs. So, make sure the IDs are the same in all datasets. The species order in the file, does not need to be the same.

Taxa included in the phylogeny and in the dataset do not need to be the same. Only the species present both in the tree and species_info will be considered. Likewise, species missing in one of the datasets will be ignored.

This means, if you have information for several species which are not included in the phylogeny, these data will be ignored (you do not need to create a subset of your data). If you have species in the phylogeny, for which you do not have information, these species will be present in the phylogeny, but no data will be added.

This is a tab delimited table. For each species, I have the ID (same as in the phylogeny), the genus and species, the strain ID (may be the same or different from the first ID), the “host” from which they were collected, the “coolness” of our species and an accession number (e.g. for ITS).

head(species_info)
##   ID   Genus species Genus_species strain    host coolness  AccNumb
## 1 A1 Species     red   Species red     A1 Plant A     cool LR881294
## 2 A2 Species     red   Species red     A2 Plant B     cool LR881297
## 3 A3 Species     red   Species red     A3 Plant B     cool LR881298
## 4  B Species    blue  Species blue      B Plant C     cool LR881299
## 5  C Species   green Species green      C Plant C not cool LR881302
## 6  D Species   black Species black      D Plant C not cool LR881303

3.1 Data manipulation

We can manipulate the data to include or combine different variables. For example, we can automatically abbreviate the genus name (while avoiding having different genera with the same abbreviation) and further combine it with the species name. This is useful to avoid having long and repetitive names and to edit the names manually, which sometimes can introduce errors (e.g. same abbreviation for 2 different names).

species_info$Gen = abbreviate(species_info$Genus, minlength = 2, dot = T, method = "left.kept")
species_info$Gen_species = paste(species_info$Gen, species_info$species)
head(species_info)
##   ID   Genus species Genus_species strain    host coolness  AccNumb Gen
## 1 A1 Species     red   Species red     A1 Plant A     cool LR881294 Sp.
## 2 A2 Species     red   Species red     A2 Plant B     cool LR881297 Sp.
## 3 A3 Species     red   Species red     A3 Plant B     cool LR881298 Sp.
## 4  B Species    blue  Species blue      B Plant C     cool LR881299 Sp.
## 5  C Species   green Species green      C Plant C not cool LR881302 Sp.
## 6  D Species   black Species black      D Plant C not cool LR881303 Sp.
##   Gen_species
## 1     Sp. red
## 2     Sp. red
## 3     Sp. red
## 4    Sp. blue
## 5   Sp. green
## 6   Sp. black

4 Plotting the tree

The default tree is just the skeleton and does not include any additional information. Everything in addition will need to be requested.

ggtree(tree)

You have many different possible layouts and styles. These include rectangular trees, rounded corners, circular, dendograms, networks, etc. You may also rotate and change angles and whether to consider branch lengths or not. All options are available in the description of the ggtree.

ggtree(tree)+
  ggtree(tree, layout="roundrect")+
  ggtree(tree, layout="slanted")+
  ggtree(tree, layout="ellipse")+
  ggtree(tree, layout="circular")+
  ggtree(tree, layout="fan", open.angle=120)+
  ggtree(tree, layout="equal_angle")+
  ggtree(tree, layout="daylight")+
  ggtree(tree, branch.length='none')+
  ggtree(tree, layout="ellipse", branch.length="none")+
  ggtree(tree, branch.length='none', layout='circular')+
  ggtree(tree, layout="daylight", branch.length = 'none')

5 Format branches

The default tree looks like this:

ggtree(tree) 

We will start by increase the thickness of the branches

ggtree(tree, size = 1.5) 

We may also change the type of line.

ggtree(tree, size = 1.5, linetype="dotted")

5.1 Colour branches

And choose a specific colour for the lines.

ggtree(tree, size = 1.5, colour="red", linetype="dotted") 

We can also colour the branches according to data. In this case, we will colour according to “branch length”. You may also choose more useful information, such as mutation rates, number of positively selected genes, molecular clocks, etc.

ggtree(tree, branch.length = "branch.length", size = 1.5, aes(color=branch.length)) +
  scale_color_continuous(low='dodgerblue2', high='red2') +
  theme(legend.position="right")+
  labs(color='Branch length') 

5.2 colour clades

We may also colour according to groups (e.g. genus). For this, we define a group of clades, based on node ID (we will learn later how to find out the node IDs) and colour branches by group.

groups <- groupClade(tree, .node=c(51, 36))

ggtree(groups, aes(color=group), size = 1.5)+
scale_color_discrete(name = "Genus", labels = c("Genus A", "Genus B", "Genus C"))

5.3 save formatted tree as ‘p

Similar to ggplot, we can save the formatted tree as a variable in R, so we can later easily visualize it or add things to ‘p’ without having to run the code again. We will save the following tree as ‘p’:

p = ggtree(tree, size = 1)
p

6 Adding scale to tree

The default plot does not include a scale. This is important to add, to give sense to the branch lengths. Since it is not drawn by default, we need to specify it.

p + geom_treescale()

6.1 Positioning the scale bar (based on coordinates)

We can adjust the position of the scale, based on x/y coordinates. The tree will adjust itself to the new position of the scale.

p + geom_treescale(x=0.5, y=0)

p + geom_treescale(x=0, y=30)

p + geom_treescale(x=0.8, y=30)

p + geom_treescale(x=0.8, y=15)

p + geom_treescale(x=0.8, y=0)

p + geom_treescale(x=0, y=-1)

6.2 Formatting the scale bar and value

We can also format the font size, thickness of the line, colours of the text and line, etc.

p + geom_treescale(x=0, y=-1, fontsize = 3)

p + geom_treescale(x=0, y=-1, fontsize = 3, linesize=1)

6.3 save formatted tree

Once you find the right format, save it as ‘p’.

p=p + geom_treescale(x=0, y=-1, fontsize = 3, linesize=1)
p

7 Adding species names (tip labels)

By default, the names of the taxa are also not included. This is another vital information we need to add.

p + geom_tiplab()

7.1 Aligning species names

In some cases, it might be useful to align the species names on the right. The dots leading to the names are automatic.

p + geom_tiplab(align = TRUE) 

For the remaining tutorial, I will continue using the names unaligned and next to the branch.

p + geom_tiplab() 

8 Adding data from ‘species_info’ to the tree

currently, we have the default IDs of the phylogenetic tree. We can replace these by any information we have available on the species_info dataset. We will add now the genus and species names.

This is done by providing the tree plot (p) and combine it with the dataset we want to map (p %<+% species_info). Note the special symbol (%<+%). This symbol adds annotation data to a tree.

p %<+% species_info + 
  geom_tiplab(aes(label = factor(Genus_species)))

8.1 Comprise the graph, so the complete labels are visible

The names are too long for the plot. Since we have very long branches, We can comprise the tree to make it more compact, while preserving the branch length information and adjusting the scale, accordingly.

p %<+% species_info + 
  geom_tiplab(aes(label = factor(Genus_species)))  +
  xlim(0,1)

9 Format the species names

9.1 Italicize the species names

Usually, species names should be in italic. We can easily format the names as well.

p %<+% species_info + 
  geom_tiplab(aes(label = factor(Genus_species), fontface = "italic"))  +
  xlim(0,1)

9.2 Colour the species names

We may also change the colour and size, to our preference and needs.

p %<+% species_info + 
  geom_tiplab(aes(label = factor(Genus_species), fontface = "italic"), colour = "blue", size = 5)  +
  xlim(0,1)

9.3 Colour the species names by genus

Or we can colour the names according to relevant information. In this case, we will colour based on genus names (present in species_info dataset)

p %<+% species_info + 
  geom_tiplab(aes(label = factor(Genus_species), fontface = "italic", colour = Genus), size = 5)  +
  xlim(0,1)

9.4 Combine information on labels

We may also combine different information on the labels. For example, genus name, species name and strain ID. Each can be formatted differently.

p %<+% species_info + 
  geom_tiplab(aes(label=paste0('italic(', Genus, ')~italic(', species, ')~bold(', strain, ')')), parse=T)+
  xlim(0,1)

p %<+% species_info + 
  geom_tiplab(aes(label=paste0('italic(', Gen, ')~italic(', species, ')~bold(', strain, ')')), parse=T)+
  xlim(0,1)

9.5 Add accession numbers

We may also combine information with custom text. In this case, we print the accession numbers between parentheses.

p %<+% species_info + 
  geom_tiplab(aes(label=paste0('italic(', Gen, ')~italic(', species, ')~bold(', strain, ')~plain(', "(", AccNumb, ")",')')), parse=T)+
  xlim(0,1)

9.6 Add hosts

Or we can, for example, also add the hosts where the strains were isolated from. In this case, we combine species information, custom text (“in host”) and individual formatting for each (plain, bold and italic). We might need to adjust the space between factors with the ‘offset’ parameter.

 p %<+% species_info + 
  geom_tiplab() + 
  geom_tiplab(aes(label = "in host", fontface = "bold"), offset = 0.04) + 
  geom_tiplab(aes(label = factor(host), fontface = "italic"), offset=0.14)+
  xlim(0,1)   

9.7 Saving some of the trees for later

# Full genus name
p2 = p %<+% species_info + 
  geom_tiplab(aes(label=paste0('italic(', Genus, ')~italic(', species, ')~bold(', strain, ')')), parse=T)+
  xlim(0,1)

# Abbreviated genus name
p3 = p %<+% species_info + 
  geom_tiplab(aes(label=paste0('italic(', Gen, ')~italic(', species, ')~bold(', strain, ')~plain(', "(", AccNumb, ")",')')), parse=T)+
  xlim(0,1)

10 Adding bootstrap

Another important missing information, is the node support or bootstrap.

p2

The default placement and formatting might not be the best:

p2 + geom_nodelab()

10.1 Formatting bootstrap

We have many options to format the text, but we’re mainly interested in changing its font size

p2 + geom_nodelab(size=2.5)

and it’s relative position to the nodes.

p2 + geom_nodelab(size=2.5, hjust = 1.5, vjust=-0.5)

10.2 save tree

Now we save the tree as ‘p4

p4 = p2 + geom_nodelab(size=2.5, hjust = 1.5, vjust=-0.5)

11 Find node numbers

Each node has a unique ID number. We can address these node numbers to apply additional formatting (e.g. colours) to clades deriving from this specific node.

ggtree(tree) %<+% species_info + 
  geom_text(aes(label=node), hjust=-.3, size=3)

12 Formatting clades

Now we will format some clades.

p4

We can flip/rotate branches and clades. In this case, we specify that we flip (change the order) node 36 and node 51.

flip(tree_view = p4, node1 = 36, node2 = 51)

13 Reroot

By using node numbers, we can also specify new roots to the tree. We now selected node 51 as the new tree.

In this case, we need to recompute the previous steps. We cannot flip an existing phylo object (i.e. a tree saved as a plot). We need to reroot the tree we read in the beginning (tree).

We can select the new root with root() and by specifying the node number. In addition, we draw already the new tree (by including all commands within ggtree().

rerooted_tree = ggtree(root(tree, node = 51, edgelabel = TRUE), size=1) +
  geom_treescale(x=0, y=-1, fontsize = 3, linesize=1) 

Now, we can remap the data to this rerooted tree, as before.

rerooted_tree %<+% species_info + 
  geom_tiplab(aes(label=paste0('italic(', Genus, ')~italic(', species, ')~bold(', strain, ')')), parse=T)+
  geom_nodelab(size=2.5, hjust = 1.5, vjust=-0.5)+
  xlim(0,1)

14 Remove species from phylogeny

We have several strains for the same species (e.g. A1, A2, A3).

p4

We may remove taxa from the phylogeny. The branch lengths will be recalculated to reflect this, so the tips have the same total length, from the last common node.

We can specify which taxa we want to drop, by listing their IDs. In this case we will exclude multiple strains for the same species.

Then, we will plot the resulting tree and format it, as before.

edited_tree = drop.tip(tree, c("N2", "N3", "R2", "U2", "U3", "A2", "A3", "P2", "I2"))

edited_tree_p = ggtree(edited_tree, size = 1) +
  geom_treescale(x=0, y=-1, fontsize = 3, linesize=1) 

edited_tree_p %<+% species_info + 
  geom_tiplab(aes(label=paste0('italic(', Genus, ')~italic(', species, ')~bold(', strain, ')')), parse=T)+
  geom_nodelab(size=2.5, hjust = 1.5, vjust=-0.5)+
  xlim(0,1)

14.1 Save tree

We can save and export the edited tree. This will be in ‘newick’ format, just like our original output from RAxML.

write.tree(edited_tree, "trees/my_edited.tree")

15 Annotating clades

We might need to annotate some clades.

p2 + geom_nodelab(size=2.5, hjust = 1.5, vjust=-0.5)

We will delimit a clade, based on node number, and add a label to it. The labels may be formatted independentely.

p5=p2 + geom_nodelab(size=2.5, hjust = 1.5, vjust=-0.5)+xlim(0,1.3)+
  geom_cladelabel(node = 51, label = "Species clade", barsize = 2, align = F, offset = 0.3, offset.text = 0.02, horizontal = F, angle = -90, hjust=0.5)+
  geom_cladelabel(node = 36, label = "Something clade", barsize = 2, align = F, offset = 0.35, offset.text = 0.01, horizontal = TRUE)  
p5

15.1 collapse clades

Additionally, we might want to collapse monophyletic clades which we might not need to fully represent. This is accomplished by node number, as well. Each collapsed group, can have different filling colours. “Fill” addresses the filling colour, and “colour” the contour of the shape. Additional formatting parameters might be added (e.g. size, alpha, etc)

p5 %>% collapse(node = 61, 'max', fill = "black") %>%
  collapse(node = 55, 'max', fill = "grey", colour = "black")

We can label these collapsed clades, as well and format them accordingly.

p5 %>% collapse(node = 61, 'max', fill = "black") %>%
  collapse(node = 55, 'max', fill = "grey", colour = "black") + 
  geom_cladelabel(node = 61, label = "Collapsed group 1", barsize = 0, align = F, offset = 0.13, horizontal = T, fontface = "bold", colour = "black")+
  geom_cladelabel(node = 55, label = "Collapsed group 2", barsize = 0, align = F, offset = 0.23, horizontal = T, fontface = "bold", colour = "azure4")

15.2 Extracting clades based on node number

Specific clades can be extracted and formatted from the original tree. This is based on node number, as well.

clade_57 = viewClade(ggtree(tree, color="red", linetype="dotted", size=2), node = 57)+
  geom_tiplab()+geom_nodelab(size=2.5, hjust = 1.5, vjust=-0.5)+
  ggtitle("subset of the Species clade")
clade_57

clade_37 = viewClade(ggtree(tree, color="blue", size = 2), node = 37)+
  geom_tiplab()+geom_nodelab(size=2.5, hjust = 1.5, vjust=-0.5)+
  ggtitle("subset of the Something clade")
clade_37

15.3 Plotting the 3 trees

We may plot different trees or clades together in the same figure. This can be done by different packages.

p6=p4 + geom_highlight(node = 57, alpha = 0.2, fill="red") + 
  geom_highlight(node = 37, alpha = 0.2, fill="blue") + 
  xlim(0,1.5) + 
  ggtitle("Best phylogeny ever")

p6 + clade_57 / clade_37

16 Adding images/photos to tree

When we are looking at specific groups and taxa, it might be useful to add some images to them.

clade_37

Sometimes, we might want to add macro or microscopic photos, a cartoon of the host, etc. These are mapped to the taxa ID, based on the image file name. The folder should contain images for every taxa (otherwise it will not work) and each file, should have the same “label” as the taxa. The size, scale and transparency of the figures can be formatted.

viewClade(ggtree(tree, color="blue", size = 2), node = 37, xmax_adjust = 0.3)+
  geom_nodelab(size=2.5, hjust = 1.5, vjust=-0.5)+
  geom_tiplab()+
  geom_tiplab(aes(image=paste0('photos/', label, '.jpg')), 
              geom="image", offset=0.05, size=0.15, alpha = 1)

17 Plotting information on the tips

Sometimes, adding information to the tips can be useful.

ggtree(tree, size = 1) %<+% species_info + 
  geom_tiplab(aes(label=paste0('italic(', Genus, ')~italic(', species, ')~bold(', strain, ')')), parse=T)+
  xlim(0,1)

This can include colouring the species names based on some variable information, such as coolness, origin, etc.

ggtree(tree, size = 1) %<+% species_info + 
  geom_tiplab(aes(color=coolness, label=paste0('italic(', Genus, ')~italic(', species, ')~bold(', strain, ')')), parse=T)+
  xlim(0,1)

Or adding a shape at the tip, and adding a colour code instead.

ggtree(tree, size = 1) %<+% species_info + 
  geom_tippoint(aes(color=coolness), size=3,position = position_nudge(x = 0.01))+
  geom_tiplab(aes(label=paste0('italic(', Genus, ')~italic(', species, ')~bold(', strain, ')')), parse=T, hjust = -0.1)+
  xlim(0,1)

Additionally, we can combine shapes and colour to combine different variables, like host and coolness.

ggtree(tree, size = 1) %<+% species_info + 
  geom_tippoint(aes(shape=host, color=coolness), size=3,position = position_nudge(x = 0.01))+
  geom_tiplab(aes(label=paste0('italic(', Genus, ')~italic(', species, ')~bold(', strain, ')')), parse=T, hjust = -0.1)+
  xlim(0,1)

We will now save and use the following tree and add additional information to it.

p7 = ggtree(tree, size=1) %<+% species_info + 
  geom_tippoint(aes(color=coolness), size=3, position = position_nudge(x = 0.01))+
  geom_tiplab(hjust = -1)

p7

18 Adding data

We will now import several independent tables and use them to add information in different panels to our phylogeny.

Any imported table MUST be specified as.data.frame. Otherwise, mapping will not be accurate and no warning will be produced.

We will import general genome statistics, number of genes, SNP location, genotypic information and results from STRUCTURE analyses (two different K values).

genome_stats = read.delim("data/genome_data.txt")
genome_stats = as.data.frame(genome_stats)

genome_stats2 = read.delim("data/genome_data2.txt")
genome_stats2 = as.data.frame(genome_stats2)

genes = read.delim("data/genes.txt")
genes = as.data.frame(genes)

SNP = read.delim("data/SNP.txt")
SNP = as.data.frame(SNP)

genotype = read.delim("data/genotype.txt", header = 1, row.names = 1)
genotype = as.data.frame(genotype)

structurek4 = read.delim("data/structure_K4.txt")
structurek4 = as.data.frame(structurek4)

structurek5 = read.delim("data/structure_K5.txt")
structurek5 = as.data.frame(structurek5)

19 colour palette with 25 colours

A nice colour scheme is always useful.

colours25 <- c("dodgerblue2", "#E31A1C", "green4", "#6A3D9A", "#FF7F00", "black", "gold1", "skyblue2", "#FB9A99", "palegreen2", "#CAB2D6", "#FDBF6F", "gray70", "khaki2", "maroon", "orchid1", "deeppink1", "blue1", "steelblue4", "darkturquoise", "green1", "yellow4", "yellow3", "darkorange4", "brown")

20 Add a heatmap / genotype

First, we will add a heatmap, with the genotype information.

gheatmap(p7, genotype, width=0.6, 
         colnames=TRUE, font.size=3, 
         colnames_angle=-45, hjust=0, offset = 0.1) +
  scale_x_ggtree() + 
  scale_y_continuous(expand=c(0, 1))

21 Add STRUCTURE info

To add structure information, we will need to format the matrix, into a 3-column table. This can achieved with ‘melt’ from reshape2 package. We will also add column names to the resulting table.

melt_structurek4 = melt(structurek4, id.vars = "Ind")
colnames(melt_structurek4) = c("id", "population", "value")

melt_structurek5 = melt(structurek5, id.vars = "Ind")
colnames(melt_structurek5) = c("id", "population", "value")
pstruct = p7 + geom_facet(panel ='K4', 
                          data = melt_structurek4, 
                          geom = geom_bar, 
                          stat = "identity", 
                          position = position_stack(reverse = TRUE),
                          aes(x = value, fill=population), 
                          orientation = 'y', 
                          width = .6) +
  scale_fill_manual(values=colours25)+
  theme_tree2()
pstruct

pstruct2 = pstruct + geom_facet(panel ='K5', 
                     data = melt_structurek5, 
                     geom = geom_bar, 
                     stat = "identity", 
                     position = position_stack(reverse = TRUE),
                     aes(x = value, fill=population), 
                     orientation = 'y', 
                     width = .6) +
  scale_fill_manual(values=colours25)+
  theme_tree2()+
  theme(legend.position = "")
pstruct2 

dev.off()
## null device 
##           1
print(pstruct2, vp=viewport(angle=90))
pdf("figures/structure.pdf")
print(pstruct2, vp=viewport(angle=90))
dev.off()
## png 
##   2

22 Adding a panel with boxplots

p7

p7 + geom_facet(panel ='Size', 
                data = genome_stats2, 
                geom = geom_boxploth, 
                aes(x = Chr_size/1000000, group=label, fill=coolness)) +
  theme_tree2()

23 Adding a panel with distribution curves

p7 + geom_facet(panel ='Distribution', 
                data = genome_stats2, 
                geom = geom_joy, 
                aes(x = Chr_size, group=label, fill=coolness),
                color='grey80', lwd=.3) +
  theme_tree2()

24 Adding a panel with a scaled circle

p7 + geom_facet(panel ='Genome size (Mbp)', 
                data = genome_stats, 
                geom = geom_point, 
                aes(x=0, size = Genome_size/1000000), 
                fill="black",
                orientation = 'y', 
                width = .6) +
  theme_tree2()

p7 + geom_facet(panel ='Genome size (Mbp)', 
                data = genome_stats, 
                geom = geom_point, 
                aes(x=0, size = Genome_size/1000000), 
                fill="black",
                orientation = 'y', 
                width = .6)+
  xlim(0,0.8)

25 Adding a panel with a geom_col

p7 + geom_facet(panel ='Genome size (Mbp)', 
                data = genome_stats2, 
                geom = geom_bar, 
                stat = "identity", 
                position = position_stack(reverse = TRUE),
                aes(x = Chr_size/1000000, fill=Chr), 
                orientation = 'y', 
                width = .6) +
  scale_fill_manual(values=colours25)+
  theme_tree2()

p8 = p7 + geom_facet(panel ='Genome size (Mbp)', 
                data = genome_stats, 
                geom = geom_col, 
                aes(x = Genome_size/1000000), 
                fill="black",
                orientation = 'y', 
                width = .6) +
     theme_tree2()
p8

26 Adding gene information

p9 = p8 + geom_facet(panel ='Proteome',
                data = genes, 
                geom = geom_bar, 
                stat = "identity", 
                position = 'stack',
                aes(x = genes, fill=SP),
                colour = "black",
                orientation = 'y', 
                width = .6) +
  theme_tree2()+
  theme(legend.position = "right", 
        axis.text = element_text(colour = "black"),
        axis.ticks = element_line(colour = "black"))+
  scale_fill_manual(values=c("Non-secreted proteins" = "darkgrey","Secreted proteins" = "white"))

p9

p10 = p9 + geom_facet(panel = "SNP", 
                data = SNP, 
                geom = geom_point, 
                mapping=aes(x = pos, color=coolness), 
                shape = '|', 
                size=5)+
  theme(legend.position = "bottom")+
  xlim_tree(1)
p10

p11=facet_widths(p10, c(Tree = 1, "Genome size (Mbp)" = 0.2, Proteome = 0.5, SNP = 2))
p11

pdf("figures/MyStudy.pdf", width = 13, height = 8)
p11
dev.off()
## png 
##   2
tiff("figures/MyStudy.tiff", width = 13, height = 8, res = 100, units = "in")
p11
dev.off()
## png 
##   2

27 Circular plot

pc = ggtree(tree, layout = 'circular', size=1)
pc

pc2 = pc %<+% species_info +
  geom_tippoint(aes(color=coolness), size=3,position = position_nudge(x = 0.01))+
  geom_tiplab(offset = 0.03)
pc2

pc3 = pc2 +
  geom_fruit(
  data=genome_stats,
  geom=geom_col,
  mapping = aes(x=Genome_size, y=ID),
  fill="black",
  orientation = 'y',
  offset = 0.2)
pc3 

pc4 = pc3 + 
  geom_fruit(
  data=genome_stats2,
  geom=geom_col,
  mapping = aes(x=Chr_size, y=ID, fill=Chr),
  orientation = 'y')+
  scale_fill_manual(values=colours25)
pc4

pc5 = pc3 + 
  geom_fruit(
  data=genes,
  geom = geom_bar, 
  mapping = aes(x = genes, y=id, fill=SP),
  colour = "black",
  orientation = 'y',
  stat="identity")+
  scale_fill_manual(values=c("Non-secreted proteins" = "darkgrey","Secreted proteins" = "black"))
pc5

28 Rotates the tree

rotate_tree(pc5, 265)

29 Adds highlight

rotate_tree(pc5, 265) + geom_highlight(node = 57, alpha = 0.3, fill="red") +
                        geom_highlight(node = 37, alpha = 0.3, fill="blue")

30 Adds a heatmap on a circular plot

pc6 = rotate_tree(pc2, 265) 
gheatmap(pc6, genotype, offset=0.12, width=0.6, colnames_offset_y = 0, font.size=3)

gheatmap(pc6, genotype, offset=0.12, width=0.6, colnames_offset_y = 0, font.size=3)+
  gheatmap(p7, genotype, width=0.6, 
           colnames=TRUE, font.size=3, 
           colnames_angle=-45, hjust=0, offset = 0.1) +
  scale_x_ggtree() + 
  scale_y_continuous(expand=c(0, 1)) 

31 Compare trees

tree1 = read.tree("trees/RAxML.tree")
tree2 = read.tree("trees/RAxML_gene2.tree")

p1 <- ggtree(tree1)
p2 <- ggtree(tree2)

p1 + geom_tiplab() + annotate("text", x=0.1, y=33, label = "Gene 1", fontface = "bold") +
  p2 + geom_tiplab() + annotate("text", x=0.2, y=33, label = "Gene 2", fontface = "bold")

d1 <- p1$data
d2 <- p2$data

## reverse x-axis and 
## set offset to make the tree on the right-hand side of the first tree
d2$x <- max(d2$x) - d2$x + max(d1$x) + 1

pp <- p1 + geom_tree(data=d2) +      
  ggnewscale::new_scale_fill() +
  geom_hilight(
    data = d2, 
    mapping = aes( 
      subset = node %in% c(38, 48, 58),
      node=node,
      fill=as.factor(node))) +
  labs(fill = "clades for tree in right" ) 

# Filter for connections between tips
dd = bind_rows(d1, d2) %>% 
  filter(!is.na(label))
dd = subset(dd, dd$isTip == "TRUE")

pp + 
  geom_tiplab( bg.colour = alpha('firebrick', .5)) +
  geom_tiplab(data = d2, hjust=1, 
              bg.colour = alpha('firebrick', .5))+
  annotate("text", x=0.1, y=33, label = "Gene 1", fontface = "bold")+
  annotate("text", x=2.4, y=33, label = "Gene 2", fontface = "bold")

pp + geom_line(aes(x, y, group=label), data=dd, color='grey') +
  geom_tiplab( bg.colour = alpha('firebrick', .5)) +
  geom_tiplab(data = d2, hjust=1, 
              bg.colour = alpha('firebrick', .5))+
  annotate("text", x=0.1, y=33, label = "Gene 1", fontface = "bold")+
  annotate("text", x=2.4, y=33, label = "Gene 2", fontface = "bold")

pp + geom_line(aes(x, y, group=label), data=dd, color='grey') +
  geom_tiplab( bg.colour = alpha('firebrick', .5)) +
  geom_tiplab(data = d2, hjust=1, 
              bg.colour = alpha('firebrick', .5))+
  annotate("text", x=0.1, y=33, label = "Gene 1", fontface = "bold")+
  annotate("text", x=2.4, y=33, label = "Gene 2", fontface = "bold")+
  geom_highlight(node = 56, alpha = 0.2, fill="red", extendto = 0.6)+
  geom_highlight(node = 47, alpha = 0.2, fill="blue", extendto = 0.75)

pp + geom_line(aes(x, y, group=label), data=dd, color='grey') +
  geom_tiplab( bg.colour = alpha('firebrick', .5)) +
  geom_tiplab(data = d2, hjust=1, 
              bg.colour = alpha('firebrick', .5))+
  annotate("text", x=0.1, y=33, label = "Gene 1", fontface = "bold")+
  annotate("text", x=2.4, y=33, label = "Gene 2", fontface = "bold")+
  geom_highlight(node = 56, alpha = 0.2, fill="red", extendto = 0.6)+
  geom_highlight(node = 47, alpha = 0.2, fill="blue", extendto = 0.75)+
  geom_taxalink(taxa1 = 47, taxa2 = 56, color="darkorange2", curvature=-.9, lwd=0.8, ncp = 10, linetype = "dashed", 
                arrow = arrow(length = unit(0.02, "npc"), 
                              type="closed",
                              ends = "both"))

pp + geom_line(aes(x, y, group=label), data=dd, color='grey') +
  geom_tiplab( bg.colour = alpha('firebrick', .5)) +
  geom_tiplab(data = d2, hjust=1, 
              bg.colour = alpha('firebrick', .5))+
  annotate("text", x=0.1, y=33, label = "Gene 1", fontface = "bold")+
  annotate("text", x=2.4, y=33, label = "Gene 2", fontface = "bold")+
  geom_highlight(node = 56, alpha = 0.2, fill="red", extendto = 0.6)+
  geom_highlight(node = 47, alpha = 0.2, fill="blue", extendto = 0.75)+
  geom_taxalink(taxa1 = 47, taxa2 = 56, color="darkorange2", curvature=-.9, lwd=0.8, ncp = 10, linetype = "dashed", 
                arrow = arrow(length = unit(0.02, "npc"), 
                              type="closed",
                              ends = "both"))+
  geom_taxalink(taxa1 = 37, taxa2 = 41, color="mediumblue", curvature=-.45, lwd=0.8, ncp = 10, offset = -0.04)

32 Other examples

library(gggenes) # used only to get the example dataset
head(example_genes)
##   molecule gene start   end  strand orientation
## 1  Genome1 genA 15389 17299 reverse           1
## 2  Genome1 genB 17301 18161 forward           0
## 3  Genome1 genC 18176 18640 reverse           1
## 4  Genome1 genD 18641 18985 forward           0
## 5  Genome1 genE 18999 20078 reverse           1
## 6  Genome1 genF 20086 20451 forward           1
get_genes <- function(data, genome) {filter(data, molecule == genome) %>% pull(gene)}
g <- unique(example_genes[,1])
n <- length(g)
d <- matrix(nrow = n, ncol = n)
rownames(d) <- colnames(d) <- g
genes <- lapply(g, get_genes, data = example_genes)

for (i in 1:n) {
  for (j in 1:i) {
    jaccard_sim <- length(intersect(genes[[i]], genes[[j]])) / 
      length(union(genes[[i]], genes[[j]]))
    d[j, i] <- d[i, j] <- 1 - jaccard_sim
  }
}

tree <- ape::bionj(d) 

ggtree(tree, branch.length='none') + 
  geom_tiplab() + xlim_tree(5.5)

ggtree(tree, branch.length='none') + 
  geom_tiplab() + xlim_tree(5.5) + 
  geom_facet(mapping = aes(xmin = start, xmax = end, fill = gene),
             data = example_genes, geom = geom_motif, panel = 'Alignment',
             on = 'genE', label = 'gene', align = 'left') +
  scale_fill_brewer(palette = "Set3") + 
  scale_x_continuous(expand=c(0,0)) +
  theme(strip.text=element_blank(),
        panel.spacing=unit(0, 'cm'),
        axis.text = element_text(size=15))

33 List of functions, arguments and parameters

Command Description Package Example
read.tree() Loads tree file. Creates a phylo object. ape tree = read.tree(“RAxML.tree”)
ggtree() Draws phylogenetic tree from phylo object ggtree p = ggtree(tree)
%<+%

34 Useful commands

Command Description Package Example
abbreviate() Abbreviates strings (e.g. words) to at least minlength (e.g. 4) characters, such that they remain unique.
save.image("workshop_ggtree_session.RData")